Research on discovering deep web entries

Authors

  • Ying Wang
  • Huilai Li
  • Wanli Zuo
  • Fengling He
  • Xin Wang
  • Kerui Chen
Abstract

Ontology plays an important role in locating Domain-Specific Deep Web content. This paper therefore presents WFF, a novel framework for efficiently locating Domain-Specific Deep Web databases based on focused crawling and ontology. WFF constructs a Web Page Classifier (WPC), a Form Structure Classifier (FSC), and a Form Content Classifier (FCC) in a hierarchical fashion. First, WPC discovers potentially interesting pages using an ontology-assisted focused crawler. Then, FSC analyzes these pages and determines, from their structural characteristics, whether they contain searchable forms. Finally, FCC identifies the searchable forms that belong to a given domain at the semantic level and stores the URLs of these Domain-Specific searchable forms in a database. A detailed experimental evaluation shows that the WFF framework not only simplifies the discovery process but also effectively identifies Domain-Specific databases.
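
The three-stage hierarchy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the classifiers below are simple keyword and structure heuristics standing in for WPC, FSC, and FCC, the flat term set `BOOK_TERMS` is a hypothetical substitute for a real domain ontology, and the function names, thresholds, and example URLs are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Hypothetical stand-in for a domain ontology: a flat set of domain terms.
BOOK_TERMS = {"book", "author", "title", "isbn", "publisher"}


@dataclass
class Form:
    action: str = ""
    query_fields: int = 0       # text/search inputs, selects, textareas
    password_inputs: int = 0
    field_names: list = field(default_factory=list)


class FormExtractor(HTMLParser):
    """Collects every <form> with simple structural counts."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self._current = Form(action=a.get("action") or "")
        elif self._current is not None and tag in ("input", "select", "textarea"):
            kind = (a.get("type") or "text").lower()
            if tag == "input" and kind == "password":
                self._current.password_inputs += 1
            elif tag != "input" or kind in ("text", "search"):
                self._current.query_fields += 1
                if a.get("name"):
                    self._current.field_names.append(a["name"].lower())

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def wpc_is_relevant(html, terms, min_hits=2):
    """Stage 1 (WPC stand-in): the page mentions enough domain terms."""
    return len(tokens(html) & terms) >= min_hits


def fsc_is_searchable(form):
    """Stage 2 (FSC stand-in): structure only; login forms are rejected."""
    return form.query_fields >= 1 and form.password_inputs == 0


def fcc_in_domain(form, terms):
    """Stage 3 (FCC stand-in): field names overlap the domain terms."""
    return bool(tokens(" ".join(form.field_names)) & terms)


def discover(pages, terms):
    """Return URLs of pages holding a domain-specific searchable form."""
    found = []
    for url, html in pages.items():
        if not wpc_is_relevant(html, terms):
            continue  # filtered by stage 1; later stages never run
        parser = FormExtractor()
        parser.feed(html)
        if any(fsc_is_searchable(f) and fcc_in_domain(f, terms)
               for f in parser.forms):
            found.append(url)
    return found


# Hypothetical pages: only the first should pass all three stages.
PAGES = {
    "http://example.org/books": '<h1>Find a book by author</h1>'
        '<form action="/search"><input type="text" name="title">'
        '<input type="text" name="author"><input type="submit" value="Go"></form>',
    "http://example.org/login": '<h1>Book club: sign in to see each title</h1>'
        '<form action="/login"><input type="text" name="user">'
        '<input type="password" name="pass"></form>',
    "http://example.org/news": '<h1>Book news by email, one title a week</h1>'
        '<form action="/subscribe"><input type="text" name="email"></form>',
    "http://example.org/weather": '<h1>Weather</h1>'
        '<form action="/w"><input type="text" name="city"></form>',
}

print(discover(PAGES, BOOK_TERMS))  # ['http://example.org/books']
```

Each stage is cheaper than the one after it and prunes its input, which is the point of a hierarchical arrangement: the login page is rejected on structure alone, while the newsletter form passes the structural check and is rejected only by the content check.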


Similar resources

A Framework for Discovering Associations from the Annotated Biological Web

During the last decade, biomedical researchers gained access to the entire human genome, reliable high-throughput biotechnologies, and affordable computational resources and network access. In combination, these new tools created a new model for biomedical research that no longer uses computational tools merely to monitor research, but instead exploits these tools to acquire knowledge and make ...


Focused Crawling of the Deep Web Using Service Class Descriptions

Dynamic Web data sources—sometimes known collectively as the Deep Web—increase the utility of the Web by providing intuitive access to data repositories anywhere that Web access is available. Deep Web services provide access to real-time information, like entertainment event listings, or present a Web interface to large databases or other data repositories. Recent studies suggest that the size ...


Service Class Driven Dynamic Data Source Discovery with DynaBot

Dynamic Web data sources – sometimes known collectively as the Deep Web – increase the utility of the Web by providing intuitive access to data repositories anywhere that Web access is available. Deep Web services provide access to real-time information, like entertainment event listings, or present a Web interface to large databases or other data repositories. Recent studies suggest that the s...


Hidden Web Indexing Using HDDI Framework

There are various methods of indexing the hidden web database like novel indexing, distributed indexing or indexing using map reduce framework. Our goal is to find an optimized indexing technique keeping in mind the various factors like searching, distribute database, updating of web, etc. Here, we propose an optimized method for indexing the hidden web database. This research uses Hierarchical...


Discovering the Biomedical Deep Web

The rapid growth of biomedical information in the Deep Web has produced unprecedented challenges for traditional search engines. This paper describes a new Deep web resource discovery system for biomedical information. We designed two hypertext mining applications: a Focused Crawler that selectively seeks out relevant pages using a classifier that evaluates the relevance of the document with re...



Journal:
  • Comput. Sci. Inf. Syst.

Volume 8, Issue

Pages -

Publication date: 2011